The analysis highlights the importance of tumor size in breast cancer prognosis, with marked variation in tumor size across N and T stages. This underscores how critical early detection is. For Susan G. Komen, enhancing screening and awareness campaigns is therefore essential. In addition, the link between larger tumors and increased regional node positivity calls for personalized treatment and comprehensive nodal assessments. By integrating the hypothesis testing results and emphasizing the correlation between tumor size and stage, Susan G. Komen can bolster its role in promoting breast cancer detection, treatment, and research. This strategic focus is pivotal to improving patient outcomes and driving meaningful changes in patient care.
From the box plots, the N3 stage has the largest median tumor size, which aligns with the expectation that more advanced lymph node involvement is associated with larger primary tumors. The presence of outliers in all stages also highlights the variability in tumor size within each N stage.
From the violin plots, the T3 stage has the highest median tumor size, which is noteworthy. This suggests that patients in the T3 stage tend to have larger tumors than patients in other stages.
## [1] 0.24
## (Intercept) my_data$Reginol.Node.Positive
## 26.30875 1.00165
\[\text{Correlation} = 0.24 \text{ (weak positive correlation)}\]
\[y = 1.002x + 26.309 \text{ (where } x = \text{number of positive regional nodes and } y = \text{tumor size in mm)}\]
In summary, there is a weak positive correlation in the data: for each additional positive regional node, the fitted tumor size increases by approximately 1.002 mm.
However, judging by the residual plots given here, the linear model is not a good fit because the residuals are not homoscedastic. This may reflect the weak correlation.
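To make the fitted line concrete, the sketch below evaluates it for a few node counts and converts the reported correlation into the share of variance explained. The report itself was produced in R; this is an independent Python illustration. The function names and the example inputs (1, 5, and 10 nodes) are assumptions introduced here, while the coefficients are copied from the model output above.

```python
# Coefficients copied from the fitted model output reported above.
INTERCEPT = 26.30875   # fitted tumor size (mm) at zero positive nodes
SLOPE = 1.00165        # fitted change in tumor size (mm) per positive node
CORRELATION = 0.24     # reported correlation coefficient


def predicted_tumor_size(positive_nodes: float) -> float:
    """Return the fitted tumor size (mm) for a given number of positive nodes."""
    return INTERCEPT + SLOPE * positive_nodes


def variance_explained(r: float) -> float:
    """In simple linear regression, R^2 equals the squared correlation."""
    return r ** 2


if __name__ == "__main__":
    # Hypothetical node counts chosen for illustration, not taken from the data.
    for nodes in (1, 5, 10):
        print(f"{nodes:>2} positive nodes -> {predicted_tumor_size(nodes):.2f} mm")
    print(f"share of variance explained: {variance_explained(CORRELATION):.1%}")
```

A correlation of 0.24 corresponds to roughly 6% of the variance in tumor size being explained by node positivity, so predictions from this line should be read as rough averages rather than patient-level estimates.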
Looking at the boxplots generated, there is a clear trend that patients with distant cancer have larger tumors. To investigate whether this difference is due to chance, we use a two-sample t-test at the 5% significance level.
Does A stage impact the tumor size of patients?
H: Hypothesis \(H_{0}\) vs \(H_{1}\)
Let \(\mu_{1}\) be the mean tumor size of patients with distant A stage.
Let \(\mu_{2}\) be the mean tumor size of patients with regional A stage.
Thus, \[ H_{0} \text{: There is no difference between } \mu_{1} \text{ and } \mu_{2} \ (\mu_{1} = \mu_{2})\]
\[ H_{1} \text{: There is a difference between } \mu_{1} \text{ and } \mu_{2} \ (\mu_{1} \neq \mu_{2})\]
##
## Two Sample t-test
##
## data: A1_data$Tumor.Size and A2_data$Tumor.Size
## t = -7.9175, df = 4022, p-value = 3.109e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.83660 -13.16857
## sample estimates:
## mean of x mean of y
## 30.07350 47.57609
T (Test Statistic): The observed test statistic is \(t = -7.9175\).
P (P-value): The p-value is \(3.109 \times 10^{-15}\).
Statistical conclusion: As the p-value < 0.05, we reject the null hypothesis.
Scientific conclusion: The data suggest that A stage has an effect on tumor size.
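As a quick sanity check on the output above, the reported figures can be tied together arithmetically: the mean difference should sit at the midpoint of the confidence interval, and dividing it by the t statistic gives the standard error. The sketch below is a Python illustration using only the printed numbers; the function name `summarize` and the dictionary keys are assumptions introduced here.

```python
# Figures copied from the two-sample t-test output reported above.
MEAN_X = 30.07350                        # "mean of x" (A1_data)
MEAN_Y = 47.57609                        # "mean of y" (A2_data)
T_STAT = -7.9175
CI_LOW, CI_HIGH = -21.83660, -13.16857   # 95 percent confidence interval


def summarize(mean_x, mean_y, t_stat, ci_low, ci_high):
    """Recover the mean difference, its standard error, and the CI multiplier."""
    diff = mean_x - mean_y
    std_err = diff / t_stat                  # because t = diff / standard error
    half_width = (ci_high - ci_low) / 2
    return {
        "difference": diff,
        "std_err": std_err,
        "ci_midpoint": (ci_low + ci_high) / 2,
        "critical_value": half_width / std_err,
    }


if __name__ == "__main__":
    result = summarize(MEAN_X, MEAN_Y, T_STAT, CI_LOW, CI_HIGH)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

The recovered multiplier should land close to 1.96, as expected for a 95% interval with several thousand degrees of freedom. The whole interval lies below zero, which is another way of seeing that the null hypothesis is rejected at the 5% level.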
The report offers substantial value to Susan G. Komen:
Research Advancement: Insights on tumor size variations and regional node positivity provide a deeper understanding of breast cancer progression, potentially guiding future research. This aligns with Susan G. Komen’s research goals, making the report a valuable resource.
Awareness Campaigns: The report's emphasis on early detection and the significance of tumor size in prognosis dovetails with Susan G. Komen’s awareness initiatives. The findings could refine these campaigns, highlighting the need for early screening and public education on key indicators.
Treatment Advocacy: The data-driven insights inform advocacy for personalized treatments and comprehensive nodal assessments, supporting patient-centric healthcare policies.
Overall, the report aligns with their mission, enhancing efforts and providing insights to further their impactful work against breast cancer.
Linear modelling was chosen to demonstrate the association between tumor size and regional node positivity. Assumption tests:
“Eye-test”: the relationship is not linear (Figure 3); however, due to the large sample size (\(n = 4024\)), normality is assumed.
Residual plots (Figure 4): the linear model is not a good fit because the residuals are not homoscedastic. This may reflect the weak correlation.
Does A stage impact the tumor size of patients?
H: Hypothesis \(H_{0}\) vs \(H_{1}\)
Let \(\mu_{1}\) be the mean tumor size of patients with distant A stage.
Let \(\mu_{2}\) be the mean tumor size of patients with regional A stage.
Thus, \[ H_{0} \text{: There is no difference between } \mu_{1} \text{ and } \mu_{2} \ (\mu_{1} = \mu_{2})\]
\[ H_{1} \text{: There is a difference between } \mu_{1} \text{ and } \mu_{2} \ (\mu_{1} \neq \mu_{2})\]
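The two samples are assumed to be independent and large enough to represent the whole population.
The report assumes that the two populations have the same variance in tumor size. The F test below, however, rejects equal variances (p-value = 6.77e-06), so this assumption is questionable.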
##
## F test to compare two variances
##
## data: A2_data$Tumor.Size and A1_data$Tumor.Size
## F = 1.8305, num df = 91, denom df = 3931, p-value = 6.77e-06
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.391156 2.512025
## sample estimates:
## ratio of variances
## 1.830517
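Because the variance ratio reported above is about 1.83, it may be worth gauging how sensitive the t-test is to the equal-variance assumption. The sketch below is a rough Python reconstruction, not the report's method: it backs out approximate group variances from the printed equal-variance t statistic and the F-test ratio, then computes Welch's unequal-variance statistic. The sample sizes are inferred from the F-test degrees of freedom, the function name is hypothetical, and the result is only as accurate as the rounded figures in the output. With the raw data, R's `t.test(..., var.equal = FALSE)` would give the exact answer.

```python
import math

# Figures copied from the reported outputs.
MEAN_X, MEAN_Y = 30.07350, 47.57609   # A1_data, A2_data
POOLED_T = -7.9175                    # equal-variance t statistic
VAR_RATIO = 1.830517                  # var(A2_data) / var(A1_data) from the F test
N_X, N_Y = 3931 + 1, 91 + 1           # assumption: sizes implied by the F-test df


def welch_from_summary(mean_x, mean_y, pooled_t, var_ratio, n_x, n_y):
    """Approximate Welch's t and its degrees of freedom from summary output."""
    diff = mean_x - mean_y
    # Pooled variance implied by t = diff / (s_p * sqrt(1/n_x + 1/n_y)).
    pooled_var = (diff / pooled_t) ** 2 / (1 / n_x + 1 / n_y)
    # Split the pooled variance using var_y = var_ratio * var_x.
    var_x = pooled_var * (n_x + n_y - 2) / ((n_x - 1) + (n_y - 1) * var_ratio)
    var_y = var_ratio * var_x
    # Variance of each sample mean, then the Welch statistic.
    vm_x, vm_y = var_x / n_x, var_y / n_y
    t_welch = diff / math.sqrt(vm_x + vm_y)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df_welch = (vm_x + vm_y) ** 2 / (vm_x ** 2 / (n_x - 1) + vm_y ** 2 / (n_y - 1))
    return t_welch, df_welch


if __name__ == "__main__":
    t_welch, df_welch = welch_from_summary(
        MEAN_X, MEAN_Y, POOLED_T, VAR_RATIO, N_X, N_Y
    )
    print(f"approximate Welch t: {t_welch:.3f}")
    print(f"approximate Welch df: {df_welch:.1f}")
```

The smaller group dominates the Welch standard error, so the statistic shrinks in magnitude and the degrees of freedom drop sharply compared with the pooled test. Under these reconstructed figures the statistic is still far from zero.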
The report assumes that the two populations have normally distributed tumor sizes.
QQ plots (Figures 5 and 6): the values generally increase linearly, suggesting a normal distribution. Some deviation is observed at the extremities; however, given the large sample size, the central limit theorem implies that the sample means are approximately normally distributed, so the t-test remains reasonable.
##
## Two Sample t-test
##
## data: A1_data$Tumor.Size and A2_data$Tumor.Size
## t = -7.9175, df = 4022, p-value = 3.109e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.83660 -13.16857
## sample estimates:
## mean of x mean of y
## 30.07350 47.57609
Statistical conclusion:
As the p-value < 0.05, we reject the null hypothesis.
Scientific conclusion:
The data suggest that A stage has an effect on tumor size.
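For reference, the equal-variance statistic reported above has the standard pooled form shown below. The symbols \(\bar{x}\), \(\bar{y}\), \(s_{x}\), \(s_{y}\), \(n_{x}\), and \(n_{y}\) are notation introduced here for the sample means, standard deviations, and sizes of A1_data and A2_data; the sizes \(n_{x} = 3932\) and \(n_{y} = 92\) are inferred from the degrees of freedom of the F test rather than stated in the report.

\[ t = \frac{\bar{x} - \bar{y}}{s_{p} \sqrt{\frac{1}{n_{x}} + \frac{1}{n_{y}}}}, \qquad s_{p}^{2} = \frac{(n_{x} - 1)s_{x}^{2} + (n_{y} - 1)s_{y}^{2}}{n_{x} + n_{y} - 2} \]
\[ \bar{x} - \bar{y} = 30.07350 - 47.57609 = -17.50259, \qquad df = n_{x} + n_{y} - 2 = 3932 + 92 - 2 = 4022 \]

The degrees of freedom match the reported value of 4022, which is consistent with the pooled (equal-variance) version of the test having been used.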
The test assumes that the two populations have normally distributed tumor sizes; however, the QQ plots show some deviation at the extremities (Figures 5 and 6).
Weak correlation between Regional Node Positivity and Tumor Size (Figure 4)
Influence of Other Prognostic Factors
Variability in Measurement